Abstract

A vital constituent of a virus is its protein shell, called the viral capsid, that 
encapsulates and hence provides protection for the viral genome. Assembly models 
are developed for viral capsids built from protein building blocks that can assume 
different local bonding structures in the capsid. This situation occurs, for example, 
for viruses in the family of Papovaviridae, which are linked to cancer and are hence 
of particular interest for the health sector. More specifically, the viral capsids of 
the (pseudo-) T = 7 particles in this family consist of pentamers that exhibit two
different types of bonding structures. While this scenario cannot be described 
mathematically in terms of Caspar-Klug Theory (Caspar and Klug 1962), it can 
be modelled via tiling theory (Twarock 2004). The latter is used to encode the 
local bonding environment of the building blocks in a combinatorial structure, 
called the assembly tree, which is a basic ingredient in the derivation of assembly 
models for Papovaviridae along the lines of the equilibrium approach of Zlotnick 
(Zlotnick 1994). A phase space formalism is introduced to characterize the changes 
in the assembly pathways and intermediates triggered by the variations in the 
association energies characterizing the bonds between the building blocks in the 
capsid. Furthermore, the assembly pathways and concentrations of the statistically 
dominant assembly intermediates are determined. The example of Simian Virus 
40 is discussed in detail. 
1 Introduction



Papovaviridae are of particular interest for the medical sector because they contain 
tumour-causing viruses. The distinctive feature of viruses in this family is the fact that 
the surface lattices of the icosahedral viral capsids, that is, the protein shells encapsu- 
lating the viral genome, are composed of clusters of five proteins only, called pentamers. 
This structural peculiarity distinguishes them from viruses in other families, and is the 
reason why the Caspar-Klug theory (Caspar and Klug 1962), which explains the struc- 
ture of most viruses with overall icosahedral symmetry, cannot be applied to viruses 
in this family. For examples see (Rayment et al. 1982), (Liddington et al. 1991) and 
(Casjens 1985). 

Virus Tiling Theory (VTT) as introduced in (Twarock 2004, 2005) provides mathemat- 
ical tools appropriate to the description of the capsid structure of Papovaviridae while 
still reproducing the tessellations (tilings) relevant to the description of the viruses in the 
Caspar-Klug classification. Its predictive power goes beyond that of the Caspar-Klug theory
because it locates not only the protein subunits but also the bonds between them. The VTT
approach thus both generalises and extends the Caspar-Klug theory. It is proving a versatile
tool for tackling open questions in modern structural virology, one of them being the
mechanism of virus capsid assembly, which we investigate in this paper.

Virus capsid assembly has been considered from various points of view in the literature. 
Besides the approach of molecular dynamics (Rapaport et al. 1999) and of thermody- 
namics as self-organisation of disks on a sphere (Bruinsma et al. 2003), combinatorial 
optimisation studies have been performed in (Reddy et al. 1998) and (Horton and Lewis 
1992). Closest to our standpoint are the local rules approach (Berger et al. 1994), where 
assembly is constrained by a set of local rules that indicate possible locally allowed config- 
urations for the protein subunits, and an equilibrium approach due to Zlotnick (Endres 
and Zlotnick 2002), where kinetic rate equations determine the concentrations of the 
assembly intermediates. 

A distinctive feature of the Zlotnick approach is the fact that the local bonding structure
of the capsomers (regular groupings of capsid proteins) is not taken into account when
constructing the assembly models. While this is an appropriate simplification for the viruses
studied by Zlotnick, it does not reflect the fact that the capsids of Papovaviridae are
formed from pentamers that differ in the structure of the inter-subunit bonds that surround
them. Here, we combine the tiling approach, which provides mathematical tools for modelling
the bonding structure of proteins via the local environments around pentamers, with the
equilibrium approach of Zlotnick. A complication when studying assembly models with several
basic types of capsomers (building blocks) and/or several types of bonds along these lines is
that distinct configurations may be energetically preferred when attaching a further
building block. The assembly process must therefore be represented as a tree of assembly
pathways rather than as a single assembly pathway as in Zlotnick's simplified model. We
develop a method to characterize the structure of the assembly trees as a function of the
association energies of the bonds, and show that only a subset of statistically dominant
intermediates needs to be taken into account when analysing the assembly process.

The paper is organised as follows. After a review of the tiling model for (pseudo-) T = 7
capsids in the family of Papovaviridae in Section 2 and a review of Zlotnick's assembly
model in Section 3, we introduce the assembly models for Papovaviridae in Section 4. In
particular, we derive the succession of assembly intermediates based on a set of building
blocks and rules for their association, and encode this information in an assembly tree.
We use the latter to derive equations for the relative concentrations of the intermediates
in solution in an in vitro experiment. In Section 5 we apply this set-up to Simian Virus
40 (SV40) and make predictions about the concentrations of the assembly intermediates
which may be tested experimentally.

2 Modelling the bonding structure of (pseudo-) 
T = 7 capsids in the family of Papovaviridae 

While (pseudo-) T = 7 capsids in the family of Papovaviridae fall outside the scope of
Caspar-Klug theory, it has been shown in (Twarock 2004) that their surface structure
can be modelled via tiling theory. The viral capsids are tessellated in terms of a set of
building blocks called tiles, and the tilings encode interactions between protein subunits,
which are marked schematically as dots on the tiles. In this way, both the locations
of the protein subunits and those of the inter-subunit bonds can be read off from the tiling.
The tiles for Papovaviridae are shown in Fig. 1. The tile on the left is called a kite and
represents a trimer interaction between the three protein subunits that are represented
as dots; the tile on the right is called a rhomb and corresponds to a dimer interaction
between the two subunits on the tile. Trimer and dimer interactions take place via an
exchange of C-terminal arms between the subunits.

Figure 1: Tiles for spherical capsids in the family of Papovaviridae.

The corresponding tiling is shown in Fig. 2, superimposed on experimental data from
(Liddington et al. 1991). It yields the surface lattice of the (pseudo-) T = 7 capsids in
the family of Papovaviridae, which are built from 360 protein subunits organised in 72
pentamers. The dots, which indicate the locations of the protein subunits with respect
to the shapes of the tiles, are located at angles of size $2\pi/5$, so that the corresponding
vertices mark the locations of pentamers and the structure of the tiles fixes their relative
orientations.

Figure 2: Tiling representing the surface lattice of spherical capsids in the family of
Papovaviridae.

The locations of the inter-subunit bonds can be read off from the tiling based on the 
interpretation of the tiles. They have been superimposed schematically as spiral arms 
on the tiling in Fig. 2 and coincide with the experimentally found bonding structure, for
example as observed for papillomavirus in (Modis et al. 2002). 

The tiling shows that there are two different types of pentamers in the capsid, distinguished
by their local bonding structure, i.e. by the locations and types of the inter-subunit bonds
that surround them. This information is encoded in the two different local environments of
vertices marking pentamers in the tiling (vertex-stars of the tiling), see Fig. 3. In order to
avoid overlaps of the building blocks, and indeed to base assembly on the pentamers only
rather than on pentamers plus surrounding subunits, we choose to work instead with the
hexagons and pentagons that are obtained from the vertex stars in Fig. 3 by cutting all
bonds perpendicularly through their middle, as shown in Fig. 4.

Figure 3: The two vertex stars in the tiling.

Figure 4: The building blocks for assembly of vertex-star models.

3 Zlotnick's model 

In (Zlotnick 1994), viral capsid assembly is described as a series of equilibrium reactions 
for a small plant virus formed from 12 pentamers. All pentamers are identical and are 
modelled geometrically as regular pentagons. This choice of building block is justified 
by the experimental observation that during assembly pentamers form first and then 
combine to produce the full capsid. 

3.1 Assumptions of the model 

For simplicity, the model assumes that the final capsid has ideal dodecahedral geometry 
and that assembly takes place by the association of a single subunit at a time to an 
existing intermediate, rather than by the association of intermediates. 

A crucial hypothesis is that all 30 edge-to-edge contacts between pentamers (located on
the dodecahedral 2-fold axes) are identical and contribute the same per-contact association
energy $\Delta G_{\text{contact}}$. This energy is used to define the association constant
$K_{\text{contact}}$ for a single inter-subunit contact through the thermodynamic relation
$\Delta G_{\text{contact}} = -RT \ln K_{\text{contact}}$, with $R$ the gas constant and $T$ the
temperature in Kelvin; $\Delta G_{\text{contact}}$ can therefore be interpreted as a free energy.
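As a quick numerical illustration of this relation, the sketch below converts a per-contact free energy into an association constant and back. It assumes $\Delta G_{\text{contact}}$ is given in kcal mol$^{-1}$ and uses $R = 1.987 \times 10^{-3}$ kcal K$^{-1}$ mol$^{-1}$ and $T = 298$ K, the values quoted later in this section; the function names are our own.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal K^-1 mol^-1
T_ROOM = 298.0     # room temperature in K


def k_contact(dg_contact: float, temp: float = T_ROOM) -> float:
    """K_contact from dG_contact = -RT ln K_contact (dg_contact in kcal/mol)."""
    return math.exp(-dg_contact / (R_KCAL * temp))


def dg_contact_from_k(k: float, temp: float = T_ROOM) -> float:
    """Inverse relation dG_contact = -RT ln K_contact, in kcal/mol."""
    return -R_KCAL * temp * math.log(k)


# A favourable (negative) contact energy gives K_contact > 1.
print(k_contact(-2.72))  # roughly 99
```

A more negative $\Delta G_{\text{contact}}$ gives an exponentially larger $K_{\text{contact}}$, which is why small changes in the contact energy have a strong effect on the intermediate concentrations discussed below.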

The model assumes that $\Delta G_{\text{contact}} \ll \ldots\ \text{kcal mol}^{-1}$, so that the most stable
intermediates correspond to those with the greatest number of inter-subunit contacts.

3.2 The model for the concentrations of the intermediates 

Based on the above assumptions, Zlotnick's model determines the concentrations of the 
most stable intermediates during assembly. In particular, it determines the assembly 
pathway for the virus and predicts the relative concentrations of the intermediates on 
this pathway. Since the pathway is linear in this model (i.e. does not have any branches 
because any further step is uniquely determined by the assumptions), it is possible to 
label the intermediates in consecutive order by their location on the pathway. In par- 
ticular, we denote the n-th assembly intermediate (also referred to as species) on the 
pathway as (n) and its concentration as [n].

For each iteration step $n$, $K_n$ denotes the association constant of the intermediate $(n)$.
It is defined as

$$K_n = \frac{[n]}{[n-1]\,[1]} \tag{1}$$

and can be decomposed into

$$K_n = S_d\, S_n\, K'_n, \tag{2}$$

where the individual factors have the following interpretation:

• Let $O_{\text{sym}}(n)$ denote the order of the discrete rotational symmetry group that
corresponds to the geometrical shape of intermediate $(n)$. Then
$S_n = O_{\text{sym}}(n-1)/O_{\text{sym}}(n)$ is the ratio of the symmetry orders of the
intermediates before and after step $n$.

• $S_d = O_{\text{sym}}(1)$ corresponds to the order of the discrete rotational symmetry group
of the shape of the incoming subunit, i.e. the pentagon. It takes care of the
geometrical degeneracy of the incoming subunit.

• $K'_n$ is a function of the number of contacts formed in the transition from intermediate
$(n-1)$ to intermediate $(n)$ and of their free energies. It is given as

$$K'_n = \exp\left(-\frac{\gamma(n)\,\Delta G_{\text{contact}}}{RT}\right) = \left(K_{\text{contact}}\right)^{\gamma(n)}, \tag{3}$$

where $\gamma(n)$ is the number of inter-subunit bonds formed.



The assembly pathway is tabulated in Section A1 of the Appendix. The concentrations
$[n]$ of the assembly intermediates on the pathway can be obtained from Eqs. (1) and (2)
as follows:

$$[n] = S_d\, S_n\, K'_n\,[n-1]\,[1] = \Bigg(\prod_{j=2}^{n} S_d\, S_j\, K'_j\Bigg)[1]^n. \tag{4}$$
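The recursion in Eq. (4) is easy to evaluate step by step. The sketch below does so for an arbitrary linear pathway, given the contact counts $\gamma(n)$ and symmetry orders $O_{\text{sym}}(n)$. The pathway data in the example are made up for illustration (a monomer, a 2-fold symmetric dimer, a 3-fold symmetric trimer and an asymmetric tetramer); they are not the dodecahedral pathway tabulated in the Appendix.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal K^-1 mol^-1
T_ROOM = 298.0     # room temperature, K


def pathway_concentrations(c1, gammas, sym_orders, dg_contact, temp=T_ROOM):
    """Concentrations [1], [2], ..., [N] along a linear pathway, following Eq. (4).

    c1          -- concentration [1] of free subunits (M)
    gammas      -- gammas[n-2] = gamma(n), contacts formed in the step (n-1) -> (n)
    sym_orders  -- sym_orders[n-1] = O_sym(n), symmetry order of intermediate (n)
    dg_contact  -- per-contact free energy (kcal/mol)
    """
    s_d = sym_orders[0]                                  # S_d = O_sym(1)
    k_contact = math.exp(-dg_contact / (R_KCAL * temp))  # K_contact
    conc = [c1]
    for n in range(2, len(sym_orders) + 1):
        s_n = sym_orders[n - 2] / sym_orders[n - 1]      # S_n = O_sym(n-1) / O_sym(n)
        k_prime = k_contact ** gammas[n - 2]             # K'_n, Eq. (3)
        conc.append(s_d * s_n * k_prime * conc[-1] * c1)  # [n] = S_d S_n K'_n [n-1][1]
    return conc


# Illustrative (made-up) start of a pentamer pathway: monomer, dimer, trimer, tetramer.
conc = pathway_concentrations(1e-6, gammas=[1, 2, 2], sym_orders=[5, 2, 3, 1], dg_contact=-2.72)
```

Because the $S_n$ telescope, the last entry equals $S_d^{N-1}\,\big(O_{\text{sym}}(1)/O_{\text{sym}}(N)\big)\,K_{\text{contact}}^{\sum\gamma}\,[1]^N$, which is a convenient consistency check on any tabulated pathway.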



3.3 Interpretation of the model 

Eq. (4) is crucial in Zlotnick's analysis of the concentrations of intermediates. It
shows how the concentrations depend on two parameters which can be tuned experimentally
at room temperature: one is the free energy of a single contact, $\Delta G_{\text{contact}}$, and
the other is the concentration $[1]$ of basic subunits.

For the choice $\Delta G_{\text{contact}} \ll \ldots\ \text{kcal mol}^{-1}$, one obtains fully disassembled
subunits and intact dodecahedra, with rare occurrence of assembly intermediates. This property
of the model is illustrated in Table 1 by tuning the second parameter, $[1]$, to three
significant values.

The first, pseudo-critical¹ value is the concentration of basic subunits needed to ensure
that the concentration of intact dodecahedra at equilibrium, calculated with Eq. (4)
for $n = 12$, namely

$$[12] = \Bigg(\prod_{j=2}^{12} S_d\, S_j\, K'_j\Bigg)[1]^{12} = \frac{5^{11}}{12}\exp\left(-\frac{30\,\Delta G_{\text{contact}}}{RT}\right)[1]^{12}, \tag{5}$$

satisfies the constraint

$$[12] = [1] \tag{6}$$

at a fixed (negative) value of $\Delta G_{\text{contact}}$ and at fixed $T$. Here the prefactor follows
from $S_d = 5$ and from the telescoping product
$\prod_{j=2}^{12} S_j = O_{\text{sym}}(1)/O_{\text{sym}}(12) = 5/60$, and the exponent counts the
30 contacts of the dodecahedron.

Eqs. (5) and (6) imply that the overall subunit-dodecahedron equilibrium is described by

$$30\,\Delta G_{\text{contact}} - RT\ln\frac{5^{11}}{12} = 11\,RT\ln K_{d,\text{app}}. \tag{7}$$

$K_{d,\text{app}}$ is therefore an apparent dissociation constant, equal to the concentration at
which incoming subunits and final capsids coincide, i.e. one has

$$K_{d,\text{app}} = [1] = [12]. \tag{8}$$

Following (Zlotnick 1994), the concentrations $[1]$ in Table 1 were chosen to be
$\tfrac{1}{2}K_{d,\text{app}}$, $K_{d,\text{app}}$ and $2K_{d,\text{app}}$, with $K_{d,\text{app}}$ calculated
from Eq. (7) with $\Delta G_{\text{contact}} = -2.72$ kcal mol$^{-1}$ and $R = 1.987$ cal K$^{-1}$ mol$^{-1}$
at room temperature ($T = 298$ K).
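The pseudo-critical concentration can be recomputed directly from Eqs. (5) and (7). The sketch below is our own check, using only the constants quoted above (function names are hypothetical). It solves Eq. (7) for $K_{d,\text{app}}$ and evaluates Eq. (5) at the three subunit concentrations used in Table 1.

```python
import math

R_KCAL = 1.987e-3       # gas constant, kcal K^-1 mol^-1
T_ROOM = 298.0          # room temperature, K
DG_CONTACT = -2.72      # per-contact free energy used for Table 1, kcal/mol
PREFACTOR = 5**11 / 12  # prod_{j=2}^{12} S_d S_j = 5^11 * O_sym(1)/O_sym(12) = 5^11 * 5/60


def dodecahedron_conc(c1, dg=DG_CONTACT, temp=T_ROOM):
    """Eq. (5): [12] = (5^11/12) exp(-30 dG_contact / RT) [1]^12."""
    return PREFACTOR * math.exp(-30 * dg / (R_KCAL * temp)) * c1**12


def kd_app(dg=DG_CONTACT, temp=T_ROOM):
    """Eq. (7) solved for K_d,app: 11 RT ln K_d,app = 30 dG_contact - RT ln(5^11/12)."""
    rt = R_KCAL * temp
    return math.exp((30 * dg - rt * math.log(PREFACTOR)) / (11 * rt))


kd = kd_app()
for c1 in (0.5 * kd, kd, 2 * kd):
    print(f"[1] = {c1:.2e} M -> [12] = {dodecahedron_conc(c1):.2e} M")
```

With these constants the sketch gives $K_{d,\text{app}} \approx 0.91 \times 10^{-6}$ M, close to the $0.88 \times 10^{-6}$ M listed in Table 1; the small difference may come from rounding of the constants. Halving or doubling $[1]$ changes $[12]$ by the factor $2^{12} = 4096$, which reproduces the order of magnitude of the last row of the table.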



4 Generalised assembly models and assembly trees 

In this section, a generalisation of Zlotnick's model to multiple building blocks and types
of inter-subunit bonds is introduced. It is considered in the context of assembly rules
that do not specify a single pathway, but instead lead to an assembly tree that branches
into several possible pathways. We show how the recursion relations for intermediate
concentrations can be evaluated on the basis of the information encoded in the assembly
tree.

¹ Following (Johnson et al. 2005) and (Tanford 1980), we use the term "pseudo-critical"
because subunits cannot freely equilibrate between the phases.
Species   Concentrations (in molar units M)
          [1] = K_d,app/2     [1] = K_d,app      [1] = 2 K_d,app
1         0.44 × 10^-6        0.88 × 10^-6       1.8 × 10^-6
2         2 × 10^-10          1 × 10^-9          5 × 10^-9
3         4 × 10^-12          3 × 10^-11         2 × 10^-10
4         1 × 10^-13          2 × 10^-12         3 × 10^-11
5         5 × 10^-15          2 × 10^-13         5 × 10^-12
6         2 × 10^-15          1 × 10^-13         1 × 10^-11
7         2 × 10^-16          3 × 10^-14         4 × 10^-12
8         2 × 10^-16          7 × 10^-14         2 × 10^-11
9         6 × 10^-16          2 × 10^-13         1 × 10^-10
10        1 × 10^-15          1 × 10^-12         1 × 10^-9
11        1 × 10^-13          2 × 10^-10         5 × 10^-7
12        2 × 10^-10          0.88 × 10^-6       3.6 × 10^-3

Table 1: The concentrations of assembly intermediates at three concentrations of basic
subunit [1] (Zlotnick 1994).



4.1 Generalisation of Zlotnick's model to N types of building 
blocks and k types of inter-subunit bonds 

We generalise the equations for the assembly intermediates in Zlotnick's assembly model
to the case of different incoming subunits $1_i$, $i = 1, \ldots, N$. In this case, the association
constant of the intermediate $(n)$ is given by

$$K_n = \frac{[n]}{[n-1]\sum_{i=1}^{N}\delta_{i,x(n)}\,[1_i]} = d_{x(n)}\, S_n\, K'_n \tag{9}$$

instead of Eq. (2). In the above expression, $x(n)$ corresponds to the incoming subunit
selected among the possible ones for addition in iteration step $n$, so that $\delta_{i,x(n)} = 1$
if $i = x(n)$ and vanishes otherwise. This formula thus describes the transition from
a specific intermediate $(n-1)$ to a specific intermediate $(n)$ via attachment of one of
the possible subunits. We have included it for completeness, as it is relevant for viral
capsids whose building blocks differ in solution (such as CCMV, which is formed
from dimers and pentamers of dimers (Johnson et al. 2005)). In contrast, the SV40
virus studied in this paper is formed from building blocks that are identical in solution
(pentamers) but are distinguished by their local environment of bonds once they have
formed a capsid.

If $d_i$ denotes the order of the discrete rotational symmetry group associated with the
shape of subunit $1_i$, then $d_{x(n)}$ encodes the geometric degeneracy of the incoming subunit
at iteration step $n$. If there are $k$ different types of inter-subunit bonds occurring in the
final capsid, each with free energy $\Delta G_j$ ($j = 1, \ldots, k$), then one has

$$K'_n = \exp\left(-\frac{\sum_{j=1}^{k}\alpha_j(n)\,\Delta G_j}{RT}\right), \tag{10}$$

where $\alpha_j(n)$ denotes the number of bonds of type $j$ formed at step $n$ with free energy
$\Delta G_j$. Note that during the transition from intermediate $(n-1)$ to intermediate $(n)$ only
a subset of the $k$ different types of inter-subunit bonds is formed in most cases, and hence
some (but not all) of the $\alpha_j(n)$ may be zero. As before, $S_n$ is defined as the ratio of the
order of the discrete rotational symmetry group associated with the geometrical shape
of assembly intermediate $(n-1)$ to the order of the symmetry group corresponding to
$(n)$.

Eq. (9) provides an expression for the concentration $[n]$ of intermediate $(n)$ in terms of
$[n-1]$, namely

$$[n] = \Omega(n)\,[n-1], \tag{11}$$

where

$$\Omega(n) = d_{x(n)}\, S_n \exp\left(-\frac{\sum_{j=1}^{k}\alpha_j(n)\,\Delta G_j}{RT}\right)\sum_{i=1}^{N}\delta_{i,x(n)}\,[1_i]. \tag{12}$$

Formula (11) generalises Eq. (4) in Zlotnick's model. In order to express $[n]$ in terms
of the concentrations $[1_i]$ of the incoming building blocks, Eq. (11) has to be
applied recursively. For this, information on all possible assembly pathways is needed as
an input, because it determines the factor $\Omega(n)$ for each iteration step $n$. This information
is encoded in the assembly tree, as discussed in the following subsection.
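The step factor $\Omega(n)$ of Eq. (12) is straightforward to evaluate once the incoming subunit type $x(n)$, the symmetry ratio $S_n$ and the bond counts $\alpha_j(n)$ of a step are known. The helper below is a minimal sketch with hypothetical argument names; the Kronecker delta is realised simply by indexing the concentration list with $x(n)$ (0-based here). Energies are in kcal mol$^{-1}$, and $R$, $T$ take the values used in Section 3.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal K^-1 mol^-1
T_ROOM = 298.0     # room temperature, K


def omega(x_n, d, s_n, bond_counts, dg, conc_basic, temp=T_ROOM):
    """Eq. (12): Omega(n) = d_{x(n)} S_n exp(-sum_j alpha_j(n) dG_j / RT) [1_{x(n)}].

    x_n          -- index of the incoming subunit type at step n (0-based)
    d            -- degeneracies d_i of the N subunit types
    s_n          -- symmetry ratio S_n = O_sym(n-1) / O_sym(n)
    bond_counts  -- alpha_j(n) for the k bond types (zeros allowed)
    dg           -- free energies dG_j of the k bond types, kcal/mol
    conc_basic   -- concentrations [1_i] of the basic subunits, M
    """
    energy = sum(a * g for a, g in zip(bond_counts, dg))
    # sum_i delta_{i,x(n)} [1_i] just selects the concentration of subunit x(n)
    return d[x_n] * s_n * math.exp(-energy / (R_KCAL * temp)) * conc_basic[x_n]
```

Applying Eq. (11) along a pathway then amounts to multiplying the successive values returned by this helper; for a single subunit type and a single bond type it reduces to the factor $S_d S_n K'_n [1]$ of Eq. (4).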



4.2 Assembly trees 

As already stressed in the introduction, a common feature of assembly models for
Papovaviridae is the occurrence of a number of assembly pathways rather than a single
one as in Zlotnick's model of Section 3. The collection of assembly pathways and their
relations are cast in an assembly tree. Evaluating the law of mass action as in Eq. (11)
for such assembly trees is very involved, and the purpose of this section is to construct
associated linear ('branchless') trees (see Fig. 5) on which Zlotnick's formula can be
used.

We start by setting up some terminology. An assembly tree is a directed graph in which 
a node represents an assembly intermediate; a link between any two nodes is drawn if the 
intermediates corresponding to these nodes are related to each other by attachment or 
detachment of a single basic building block. A path in the assembly tree is a connected 
subset of nodes and links such that each node has at most one incoming and one outgoing 
link. A node is called primary (and the corresponding assembly intermediate a primary 
intermediate) if it is located on all paths in the assembly tree. A closed bundle denotes 
the collection of paths in the assembly tree between two primary nodes. It is called 
primary if it is located between two consecutive primary nodes. For a concrete example
illustrating this terminology, see Fig. 5.

Since primary intermediates, by definition, appear on each path in the assembly tree, 
they occur with a higher frequency than the other intermediates. Therefore, we focus 
here on the concentrations of the primary intermediates only. 
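Primary nodes can be identified mechanically. The sketch below is our own illustration rather than a procedure from the text. It treats the assembly tree as a directed acyclic graph from a start node to an end node (paths may merge again, as in Fig. 5) and uses path counting: a node lies on every start-to-end path exactly when the number of paths reaching it, multiplied by the number of paths leaving it, equals the total number of paths. The example edges are a hypothetical encoding of the bundle in Fig. 5 (node names are ours), with an extra node `n-5` prepended.

```python
from collections import defaultdict


def primary_nodes(edges, source, sink):
    """Nodes lying on every source -> sink path of a directed acyclic graph."""
    succ, pred = defaultdict(list), defaultdict(list)
    nodes = {source, sink}
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
        nodes.update((u, v))

    # Topological order via Kahn's algorithm.
    indeg = {v: len(pred[v]) for v in nodes}
    order, stack = [], [v for v in nodes if indeg[v] == 0]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                stack.append(v)

    from_src = {v: 0 for v in nodes}  # number of paths source -> v
    from_src[source] = 1
    for u in order:
        for v in succ[u]:
            from_src[v] += from_src[u]

    to_sink = {v: 0 for v in nodes}   # number of paths v -> sink
    to_sink[sink] = 1
    for u in reversed(order):
        for v in succ[u]:
            to_sink[u] += to_sink[v]

    total = from_src[sink]
    if total == 0:
        return set()
    return {v for v in nodes if from_src[v] * to_sink[v] == total}


# Hypothetical encoding of Fig. 5: (n-4) branches twice, then 2 and 3 times, then single steps to (n).
edges = [("n-5", "n-4"), ("n-4", "n-4;1"), ("n-4", "n-4;2")]
for parent, n_branches in {"n-4;1": 2, "n-4;2": 3}.items():
    for i in range(1, n_branches + 1):
        child, grandchild = f"{parent},{i}", f"{parent},{i},1"
        edges += [(parent, child), (child, grandchild), (grandchild, "n")]
print(sorted(primary_nodes(edges, "n-5", "n")))
```

In this example only `n-5`, `n-4` and `n` are primary; every node strictly inside the bundle lies on only some of the five paths from (n − 4) to (n).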
Let $(n-s)$ and $(n)$ be two consecutive primary nodes in the assembly tree. Then the
factor $\Omega(n,s)$ with $[n] = \Omega(n,s)\,[n-s]$ is called the transition factor of the primary
bundle between these nodes. The aim of the remainder of this section is to derive a
closed formula for the transition factor between any two consecutive primary nodes in an
assembly tree.

We start with the following result. 

Lemma 4.1 Along each path in a closed primary bundle, the same number of each type 
of incoming subunit is added, and the same number of inter-subunit edge to edge bonds 
is formed. 

Proof: All of the paths in a closed primary bundle originate from, and terminate at, 
a primary node. Since these nodes have a well defined, unique structure in terms of the 
number of subunits and their location, the same number and types of subunits have been 
added on each path in the bundle, and the same number of inter-subunit bonds have 
been formed. □ 

This implies that each path in a closed primary bundle contains the same number of nodes.
Therefore, one can consider without loss of generality closed primary bundles consisting
of $p$ paths with an equal number of nodes each.

In order to formulate a recursion relation for the transition between two consecutive
primary nodes $(n-s)$ and $(n)$, one must introduce a labelling system that uniquely
specifies each assembly intermediate according to the path chosen from the primary
node $(n-s)$ to form it. We represent each path between $(n-s)$ and $(n)$ as a unique
$s$-dimensional vector $(i_j)$, $1 \le j \le s$, whose $j$-th component specifies the branch taken at
the intermediate node $(n-s+j-1)$. For any value of $s$ and $j$ in the range $1 \le j \le s$, let
$(n-s; i_1, \ldots, i_j)$ denote the assembly intermediate obtained from $(n-s)$ after addition
of $j$ building blocks along the path labelled by the indices $i_1, \ldots, i_j$. These indices encode
the branchings chosen between the primary node $(n-s)$ and the node $(n-s+j)$.

If $b((n-s))$ (resp. $b((n-s; i_1, \ldots, i_j))$) denotes the number of outgoing branches at node
$(n-s)$ (resp. at node $(n-s; i_1, \ldots, i_j)$), and if $p_{i_1}((n-s))$ (resp. $p_{i_j}((n-s; i_1, \ldots, i_{j-1}))$,
$2 \le j \le s$) is the probability that the branch labelled $i_1$ (resp. $i_j$) is chosen at node $(n-s)$
(resp. $(n-s; i_1, \ldots, i_{j-1})$), the announced recursion relation for the transition between
two consecutive primary nodes $(n-s)$ and $(n)$ is given by

$$[n] = \Bigg(\sum_{i_1=1}^{b((n-s))}\sum_{i_2=1}^{b((n-s;i_1))}\cdots\sum_{i_s=1}^{b((n-s;i_1,\ldots,i_{s-1}))} p_{i_1}((n-s))\,\Omega(n-s+1)_{i_1} \times p_{i_2}((n-s;i_1))\,\Omega(n-s+2)_{i_1,i_2} \times \cdots \times p_{i_s}((n-s;i_1,\ldots,i_{s-1}))\,\Omega(n)_{i_1,\ldots,i_s}\Bigg)[n-s], \tag{13}$$

with

$$\sum_{i_1=1}^{b((n-s))} p_{i_1}((n-s)) = 1 \tag{14}$$

and

$$\sum_{i_j=1}^{b((n-s;i_1,\ldots,i_{j-1}))} p_{i_j}((n-s;i_1,\ldots,i_{j-1})) = 1, \quad j \in \{2, \ldots, s\}. \tag{15}$$



Note that the factors $\Omega(n-s+j)_{i_1,\ldots,i_j}$, $1 \le j \le s$, depend on the geometric degeneracy of
the incoming subunit, on the association energy of each bond formed at each step, on
the symmetry of the intermediate $(n-s; i_1, \ldots, i_j)$, and on the concentrations of the basic
subunits. The following example illustrates the general formalism just introduced.

Fig. 5 shows a closed primary bundle in an assembly tree, i.e. the collection of assembly
pathways between two consecutive primary nodes $(n-4)$ and $(n)$. With respect to the
notation used in (13), this corresponds to the situation where $s = 4$. The number
of branches stemming from the node $(n-4)$ is $b((n-4)) = 2$, hence $i_1 \in \{1, 2\}$.
Moreover, $b((n-4; 1)) = 2$ and $b((n-4; 2)) = 3$. The probabilities marked on the
links are chosen such that the relations (14) and (15) are fulfilled; any other choice that
satisfies these equations would also have been possible. Here, one has $p_1((n-4)) = 1/3$,
$p_2((n-4)) = 2/3$, $p_1((n-4; 1)) = 1/6$, $p_2((n-4; 1)) = 5/6$, $p_1((n-4; 2)) = 2/3$,
$p_2((n-4; 2)) = p_3((n-4; 2)) = 1/6$, etc. Theorem 4.2 below shows that, from the
point of view of computing the concentration of the assembly intermediate $(n)$ from the
assembly intermediate $(n-4)$ via the law of mass action, the result is the same as using
the reduced tree (single pathway) shown on the right in Fig. 5, where the dashed line
corresponds to any of the pathways in the bundle without statistical factor correction.

Figure 5: Example of a closed primary bundle in an assembly tree (left) and the
corresponding reduced single-pathway tree (right).

Let us emphasize that in principle the concentrations of all intermediates can be calculated
using the law of mass action, but then corrective factors based on the probabilities in
Eqs. (14) and (15) appear. For example, the assembly intermediate denoted $(n-4; 2, 2)$ in
Fig. 5 has probability $p_2((n-4)) \times p_2((n-4; 2)) = 2/3 \times 1/6 = 1/9$ of being formed from
the primary node $(n-4)$. In contrast, no corrective factors appear for primary
intermediates, as we show in the theorem below.

Theorem 4.2 For each closed primary bundle between two consecutive primary nodes
$(n-s)$ and $(n)$, the concentration $[n]$ of the primary intermediate $(n)$ is given in terms
of the concentration $[n-s]$ of the primary intermediate $(n-s)$ by the following equation:

$$[n] = \Omega(n-s+1)\,\Omega(n-s+2)\cdots\Omega(n-1)\,\Omega(n)\,[n-s] = \Bigg(\prod_{j=0}^{s-1}\Omega(n-j)\Bigg)[n-s], \tag{16}$$

with $\Omega(n-j)$ as in Eq. (12).



Proof: Due to Lemma 4.1, the same subunits are added on each path in a closed
primary bundle (though possibly in a different order). Hence, the transition factor
must be proportional to $\prod_{j=0}^{s-1}\Omega(n-j)$, i.e. the transition factor introduced at the
beginning of the present section is

$$\Omega(n,s) = C\prod_{j=0}^{s-1}\Omega(n-j) \tag{17}$$

with a constant $C$. Moreover, due to the normalisation (14) and (15) of the probabilities,
one has $C = 1$. □



Note that Eq. (16) resembles Eq. (4) in Zlotnick's model. However, while Zlotnick's
model considers all intermediates, this formula applies to the concentrations of primary
intermediates only. This choice is justified by the fact that primary intermediates are
statistically dominant because they appear on all paths in the assembly tree.
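The normalisation argument behind Theorem 4.2 can be checked with exact rational arithmetic on the bundle of Fig. 5. In the sketch below the branch probabilities are those quoted in the text. The per-step factors $\Omega$ are hypothetical numbers, and, as a simplification, each path receives a permutation of the same four factors, mimicking Lemma 4.1 (same subunits and bonds, added in a different order). The weighted sum of Eq. (13) then collapses to the common product, as Eq. (16) states.

```python
from fractions import Fraction as F
from itertools import permutations

# Branch probabilities of Fig. 5: p_{i1}((n-4)) and p_{i2}((n-4; i1)).
# The last two steps of every path have a single branch, i.e. probability 1.
P_FIRST = {1: F(1, 3), 2: F(2, 3)}
P_SECOND = {1: {1: F(1, 6), 2: F(5, 6)},
            2: {1: F(2, 3), 2: F(1, 6), 3: F(1, 6)}}
PATHS = [(i1, i2) for i1 in P_FIRST for i2 in P_SECOND[i1]]


def path_weight(i1, i2):
    """Product of the branch probabilities along the path (i1, i2, 1, 1)."""
    return P_FIRST[i1] * P_SECOND[i1][i2]


def transition_factor(step_factors):
    """Eq. (13) divided by [n-4]: sum over paths of weight times the product of four Omegas."""
    total = F(0)
    for path in PATHS:
        term = path_weight(*path)
        for omega in step_factors[path]:
            term *= omega
        total += term
    return total


# Hypothetical per-step factors: one permutation of the same four numbers per path.
BASE = (F(2), F(3), F(5), F(7))
STEP_FACTORS = dict(zip(PATHS, permutations(BASE)))

print(sum(path_weight(*p) for p in PATHS))  # 1
print(path_weight(2, 2))                    # 1/9, the probability of (n-4; 2, 2)
print(transition_factor(STEP_FACTORS))      # 210 = 2*3*5*7
```

Changing the probabilities to any other set satisfying Eqs. (14) and (15) leaves the transition factor unchanged, whereas the probability of reaching a non-primary node such as $(n-4; 2, 2)$ depends on that choice.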

Using the recursion relation of Theorem 4.2, the concentration of the primary intermediate
$(n)$ can be expressed in terms of the concentrations of the incoming subunits using
Eq. (12). The formula depends on the incoming subunits $1_1, \ldots, 1_N$ and their concentrations,
their geometrical degeneracies $d_1, \ldots, d_N$, and the $k$ types of inter-subunit bonds with
free energies $\Delta G_1, \ldots, \Delta G_k$. One obtains

$$[n] = \Bigg(\prod_{j=2}^{n}\Omega(j)\Bigg)[1_{x(1)}], \tag{18}$$

with $\Omega(j)$ as in Eq. (12). In particular, for the concentration of the final capsid,
[Final Capsid], formed after $u$ steps, one obtains

$$[\text{Final Capsid}] = \frac{1}{s}\prod_{i=1}^{N}\big(d_i\,[1_i]\big)^{\eta(i)}\exp\left(-\frac{1}{RT}\sum_{j=1}^{k}\Delta G_j\sum_{m}\alpha_j(m)\right). \tag{19}$$

Here $\eta(i)$ is the number of subunits $1_i$ in the final capsid, $\sum_m \alpha_j(m)$ (summed over
all assembly steps $m$) is the total number of bonds of type $j$ formed, and $s$ is the order of
the discrete rotational symmetry group of the final capsid.
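As a sanity check, Eq. (19) should reduce to Zlotnick's Eq. (5) for a single subunit type with $d = 5$, $\eta = 12$, $s = 60$ and 30 bonds of one type. The sketch below (function and argument names are our own) evaluates the right-hand side of Eq. (19) for arbitrary inputs, taking the total bond counts per type as given rather than deriving them from a pathway.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal K^-1 mol^-1
T_ROOM = 298.0     # room temperature, K


def final_capsid_conc(d, eta, conc_basic, sym_order, bond_totals, dg, temp=T_ROOM):
    """Right-hand side of Eq. (19).

    d, eta, conc_basic -- per subunit type: degeneracy d_i, count eta(i), concentration [1_i]
    sym_order          -- s, order of the rotational symmetry group of the final capsid
    bond_totals        -- total number of bonds of each type j in the final capsid
    dg                 -- free energies dG_j of the bond types, kcal/mol
    """
    stat = 1.0
    for d_i, eta_i, c_i in zip(d, eta, conc_basic):
        stat *= (d_i * c_i) ** eta_i          # prod_i (d_i [1_i])^eta(i)
    energy = sum(b * g for b, g in zip(bond_totals, dg))
    return stat / sym_order * math.exp(-energy / (R_KCAL * temp))


# Zlotnick's dodecahedron: 12 pentamers (d = 5), icosahedral rotation group (s = 60), 30 contacts.
print(final_capsid_conc([5], [12], [0.9e-6], 60, [30], [-2.72]))
```

Since $5^{12}/60 = 5^{11}/12$, the output coincides with Eq. (5) evaluated at the same $[1]$, which confirms that the statistical prefactor of Eq. (19) is consistent with Zlotnick's.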



5 Assembly models for Papovaviridae 

Here the set-up of Section 4 is applied to (pseudo-) T = 7 viruses in the family of
Papovaviridae. As discussed in Section 2, these viruses have capsids formed from 360 proteins
which organise themselves in two different types of pentamers that are distinguished by
their local bonding environments. These, together with the locations and relative orien-
tations of the pentamers on the capsid, can be modelled via tiling theory. As can be seen
from the tiling in Fig. 2, there are two different types of basic shapes, a rhomb and a
kite, each representing a different type of inter-subunit bond. Hence, the tiling approach
provides a natural way to model different types of local environments mathematically:
the pentamers located at the twelve global five-fold axes of the tiling are surrounded by
five kite tiles, and the sixty other pentamers are surrounded by three rhomb tiles and
two kite tiles each.
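The tile counts implied by this description can be cross-checked against the 72 pentamers and 360 subunits quoted in Section 2. The sketch below assumes, as described there, that every kite carries three subunit dots and every rhomb two, and that each dot belongs to the pentamer at the adjacent vertex. It is a bookkeeping check only, not part of the assembly model.

```python
# Bookkeeping check of the tiling counts for (pseudo-) T = 7 Papovaviridae capsids.
N_FIVEFOLD = 12     # pentamers at the global 5-fold axes: 5 kites around each
N_OTHER = 60        # remaining pentamers: 3 rhombs and 2 kites around each
DOTS_PER_KITE = 3   # a kite encodes a trimer interaction (three subunits)
DOTS_PER_RHOMB = 2  # a rhomb encodes a dimer interaction (two subunits)

# Each (pentamer, surrounding tile) pair accounts for exactly one subunit dot.
kite_dots = N_FIVEFOLD * 5 + N_OTHER * 2
rhomb_dots = N_OTHER * 3
n_kites = kite_dots // DOTS_PER_KITE
n_rhombs = rhomb_dots // DOTS_PER_RHOMB
n_subunits = kite_dots + rhomb_dots

print(n_kites, n_rhombs, n_subunits)  # 60 90 360
```

The counts divide evenly and recover the 360 subunits ($72 \times 5$), so the two vertex environments are consistent with the overall composition of the capsid.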

The two different types of pentamers are modelled as two different vertex configurations,
called vertex-stars in tiling theory (see Fig. 3). The corresponding model is called the
vertex-star model and is discussed in Subsection 5.1.

In Subsection 5.2 we apply the vertex-star model to SV40 and use it to compute the
concentrations of the primary assembly intermediates.



5.1 The vertex-star model 

This model is based on the assumption that all pentamers form first in solution, as
supported by experimental evidence (Kanesashi et al. 2003), and then assemble to build
the capsid. Since the pentamers acquire their diversity only after they are bound in
the capsid, we do not distinguish between the two types of building blocks of Fig. 4 in
solution. The vertex-star model generalises Zlotnick's model in two ways: although there
is only one type ($N = 1$) of incoming subunit, namely the pentamer, there are $k = 3$
types of inter-subunit bonds, and there are different local bonding environments for the
pentamers.

We now describe these environments in some detail. While the bonds on the five edges
of the pentagonal building block associated with the pentamer on the right in Fig. 4 are
all identical, they differ for the hexagonal building block associated with the pentamer
on the left in that figure. The corresponding association energies are labelled as follows:
• We denote the association energy corresponding to a single C-terminal arm in a
trimer (represented by a kite in the tiling model) as a. As such, all five edges of
the pentagonal building block (corresponding to the pentamers at the 5-fold axes)
have an association energy of 2a. An edge of energy 2a also occurs once on the hexagonal
shape, and its two neighbouring edges have association energy a each, corresponding to
a single C-terminal arm each.

• b labels the association energy related to quasi-dimer bonds. These are located
around the 3-fold axes of the tiling and are shown as rhombs with red and blue
decorations in Fig. 2.

• c labels the association energy corresponding to strict dimer bonds along the global
2-fold axes of the tiling. These correspond to the rhombs with yellow decorations
in Fig. 2.

Note that the distinction between the association energies for bonds on local and global 2- 
fold axes is important as they are known to be different experimentally, see e.g. (Salunke 
1986), (Schwartz 2000). 

We can now revisit Fig. 4 and quantify, in terms of a, b and c, the association energies
related to all edge-to-edge contacts of the pentagonal and hexagonal building blocks, as
shown in Fig. 6.



Figure 6: Local bonding environments for the pentagonal and hexagonal building blocks.

We consider the association energies a, b and c as parameters and start by giving all 
possible increases in free energy through the addition of a new building block to an 
existing assembly intermediate. Clearly, the more contacts the incoming building block 
makes with the latter, the larger the free energy. Up to six bonds may be formed in a 
single step, and the exhaustive list of the possible association energies of these bonds
is as follows:

• For single contacts: a, 2a, b, c

• For two contacts: 3a, a + b, 2b, b + c, a + c, 4a, 2a + b, 2a + c

• For three contacts: 3a + b, a + 2b, 2b + c, a + b + c, 3a + c, 6a, 2a + b, 2a + c,
2a + b + c, 2a + 2b

• For four contacts: 3a + 2b, a + 2b + c, 4a + c, 4a + b, 8a, 2a + b + c, 2a + 2b + c, 2a + 2b

• For five contacts: 3a + 2b + c, 2a + 2b + c, 4a + b + c, 4a + 2b, 10a

• For six contacts: 4a + 2b + c

We assume that the guiding principle for assembly is the maximisation of the free energy
of the assembly intermediates at each step. In other words, assembly takes place in such
a way that the total free energy associated with the newly formed bonds is maximised.
This optimisation determines the structure of the assembly tree, which therefore depends
on the relative values of the association energies a, b and c. Our aim is to depict the
dependence of the assembly tree structure on the ratios a/c and b/c via a 2-dimensional
phase space whose coordinate axes represent a/c and b/c respectively, and where every
point represents the assembly tree corresponding to a particular choice of those ratios of
association energies.

The idea is to partition the phase space into disjoint domains such that points in a given
domain represent identical assembly trees, i.e. trees with the same assembly intermediates
organised on the same assembly pathways. In this way, one can read off from the phase
space representation which changes in a/c and b/c affect the structure of a given assembly
tree. For instance, if a single contact is realised at step n, the tree might differ depending
on whether a > b or a < b, and in the latter case one must further distinguish between
b > 2a and b < 2a. The thresholds for potential qualitative changes in the assembly tree in
this example are therefore the equalities a = b and b = 2a. We illustrate the geometrical
meaning of this in Fig. 7. In this case, the insertion of a building block at position 1



Figure 7: An assembly intermediate shown in grey with potential next sites labelled by 
letters. 

corresponds to the formation of three bonds of association energy 2a each (total energy
6a), and the insertion of a building block at position 2 corresponds to the formation of
four bonds of association energies 2a, a, b and b (total energy 3a + 2b). The relative values
of a and b therefore determine which of the two positions is selected to form the next
assembly intermediate on the pathway: position 2 is favoured if 3a + 2b > 6a, i.e. if a < 2b/3.
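The comparison between positions 1 and 2 can be written out explicitly. The sketch below is a minimal illustration, assuming (as the inequalities in this section do) that a and b are treated as positive bond strengths, so that the site with the larger total is the one selected; the function names are hypothetical.

```python
def site_energies(a: float, b: float) -> dict:
    """Total bond strength gained by inserting a block at position 1 or 2 (Fig. 7).

    a and b are treated as positive bond strengths (magnitudes)."""
    return {
        1: 2 * a + 2 * a + 2 * a,  # three bonds of strength 2a, total 6a
        2: 2 * a + a + b + b,      # four bonds 2a, a, b, b, total 3a + 2b
    }

def preferred_site(a: float, b: float, tol: float = 1e-12) -> int:
    """Position that maximises the gain; 0 signals a tie (a = 2b/3)."""
    e = site_energies(a, b)
    if abs(e[1] - e[2]) <= tol:
        return 0
    return max(e, key=e.get)

print(preferred_site(0.7, 1.37))  # 2  (6a = 4.2 < 3a + 2b = 4.84)
print(preferred_site(1.0, 1.0))   # 1  (6a = 6 > 3a + 2b = 5)
```

The first call uses a = 0.7 and b = 1.37, the absolute values of the energies quoted for SV40 in Section 5.2; the switch between the two outcomes happens on the line a = 2b/3, which is one of the threshold lines listed below.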

In general, the partition of the phase space is provided by a grid of lines whose equations
are obtained by relating pairwise the free energies listed above, namely:

• Equations relating a and b:

a = b, a = 2b, a = b/3, a = b/4, a = b/8, a = b/10, a = b/6, a = b/2, a = 2b/3,
a = b/5, a = b/7, a = b/9, a = 2b/5, a = 2b/7, a = 2b/9

• Equations relating a and c:

a = c, a = c/3, a = c/4, a = c/6, a = c/8, a = c/10, a = c/2, a = c/5, a = c/7,
a = c/9

• Equations relating b and c:

b = c, b = c/2

• Mixed cases:

a = (b + c)/2, a = (2b + c)/2, a = b + c, a = 2b + c, a = b - c, a = (b - c)/3,
a = (b - c)/4, a = c - b, a = (c - 2b)/4, a = (b + c)/3, a = (2b + c)/3, a = (b - c)/2,
a = 2b - c, a = (2b - c)/3, a = (2b - c)/4, a = (b + c)/4, a = (b + c)/6, a = (b + c)/8,
a = (b + c)/10, a = (c - b)/2, a = (c - 2b)/2, a = (2b + c)/4, a = (2b - c)/2,
a = (2b + c)/6, a = (2b + c)/8, a = (2b + c)/10, a = (b + c)/5, a = (b + c)/7,
a = (b + c)/9, a = (2b + c)/5, a = (2b + c)/7, a = (2b + c)/9
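The grid can, in principle, be regenerated mechanically from the contact lists. The following Python sketch encodes each listed energy as a coefficient triple (α, β, γ) for αa + βb + γc, sets every pair equal with c = 1, and collects the resulting lines in the (x, y) = (b/c, a/c) plane. It keeps every pairwise tie, including lines that never enter the physically relevant quadrant, so its output may be a superset of the list above rather than a reproduction of the authors' selection.

```python
from fractions import Fraction
from itertools import combinations

# Energies from the lists above as coefficient triples (alpha, beta, gamma),
# meaning alpha*a + beta*b + gamma*c. Values that occur for several contact
# numbers (e.g. 2a + b) are listed once, since only the value matters here.
CONTACTS = {
    (1, 0, 0), (2, 0, 0), (0, 1, 0), (0, 0, 1),                      # one contact
    (3, 0, 0), (1, 1, 0), (0, 2, 0), (0, 1, 1), (1, 0, 1), (4, 0, 0),
    (2, 1, 0), (2, 0, 1),                                              # two contacts
    (3, 1, 0), (1, 2, 0), (0, 2, 1), (1, 1, 1), (3, 0, 1), (6, 0, 0),
    (2, 1, 1), (2, 2, 0),                                              # three contacts
    (3, 2, 0), (1, 2, 1), (4, 0, 1), (4, 1, 0), (8, 0, 0), (2, 2, 1),  # four contacts
    (3, 2, 1), (4, 1, 1), (4, 2, 0), (10, 0, 0),                       # five contacts
    (4, 2, 1),                                                         # six contacts
}

def threshold_lines(contacts):
    """Tie lines between every pair of energies, with c = 1, x = b/c, y = a/c.

    Returns (sloped, vertical): sloped holds (slope, intercept) for y = s*x + t,
    vertical holds the x-values of lines x = const (pairs with equal alpha)."""
    sloped, vertical = set(), set()
    for (al1, be1, ga1), (al2, be2, ga2) in combinations(sorted(contacts), 2):
        # al1*y + be1*x + ga1 = al2*y + be2*x + ga2  <=>  da*y = db*x + dg
        da, db, dg = al1 - al2, be2 - be1, ga2 - ga1
        if da != 0:
            sloped.add((Fraction(db, da), Fraction(dg, da)))
        elif db != 0:
            vertical.add(Fraction(-dg, db))  # 0 = db*x + dg
    return sloped, vertical

sloped, vertical = threshold_lines(CONTACTS)
print(len(sloped), "sloped lines,", len(vertical), "vertical lines")
```

Exact fractions are used so that coincident lines (for example a = b/2 arising from several different pairs) collapse to a single entry.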

In Fig. 8 we show a region of the phase space for the case where c is normalised to 1 and
plot y = a/c (vertical axis) against x = b/c (horizontal axis). Note that several domains
of the phase space presented in Fig. 8 may exhibit the same assembly trees, because the
lines only mark potential changes. It follows that, in practice, the effective phase space
has a 'lower resolution' than that of Fig. 8.

In order to determine the location of the assembly tree of a given virus in phase space,
we need to compute a/c and b/c. For this, we use the ratios of the association energies
listed on the VIPER webpages, which yield, for SV40, x = 0.92 and y = 0.47. Therefore,
SV40 lies in the area bounded by the following lines:

$$y = \frac{x}{2}, \qquad y = 2x - 1 \qquad \text{and} \qquad y = \frac{x}{3} + \frac{1}{6}. \qquad (20)$$
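Whether a given point falls in this region can be checked with three inequalities. A minimal sketch, assuming the bounding lines are y = x/2, y = 2x - 1 and y = x/3 + 1/6 (these appear in the grid above as a = b/2, a = 2b - c and a = (2b + c)/6), and that SV40 lies strictly between them:

```python
def in_sv40_region(x: float, y: float) -> bool:
    """True if (x, y) = (b/c, a/c) lies strictly inside the area bounded by
    y = x/2 (below), y = x/3 + 1/6 (above) and y = 2x - 1."""
    return x / 2 < y < x / 3 + 1 / 6 and y < 2 * x - 1

print(in_sv40_region(0.92, 0.47))               # True: values quoted for SV40
print(in_sv40_region(1.37 / 1.49, 0.7 / 1.49))  # True: ratios of the energies in Section 5.2
print(in_sv40_region(0.92, 0.50))               # False: above y = x/3 + 1/6
```

The three lines meet pairwise at (2/3, 1/3), (0.7, 0.4) and (1, 0.5), so the region is a narrow triangle. The SV40 point sits only about 0.003 below the upper line y = x/3 + 1/6, which suggests that small changes in the energy ratios could move it into a neighbouring area.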

In Section A3 of the Appendix, we discuss the qualitative behaviour of all intermediates 
located in areas of phase space adjacent to SV40. This information can be used to 
understand how enforcing quantitative changes to the association energies of SV40 can 
lead to qualitatively different behaviours. 

In Section 5.2 we discuss the case of SV40 in detail and compute the concentrations of
the primary assembly intermediates.
5.2 Application to SV40



In this section we apply the vertex-star model to the case of SV40. In order to compute
the concentrations of the assembly intermediates based on Eq. f[TSj), one needs
information on the assembly tree, and hence a generalisation of Table ^ to the primary
intermediates in our model. Embedding the complete capsid into the plane provides an
outline of the locations of the tiles in the final capsid, as shown in Fig. 9. Capsid assembly
can hence be displayed graphically by filling in the blank spaces. In Table [3, Section A3 of
the Appendix, where all assembly intermediates are listed for SV40, this template is used
to display the structure of the primary assembly intermediates graphically.

Based on this, we compute the concentration of the final capsid in terms of the
concentration of the incoming subunits. Recall that pentamers are assumed to be
indistinguishable in solution and correspond to clusters of five proteins with five C-terminal
arms that are not in a particular configuration, but attain a configuration corresponding
to one of the two vertex stars only after attachment to the capsid. Therefore, we denote
the concentration of the indistinguishable pentamers in solution by [1], and use the fact
that their geometrical degeneracy is 5. Hence

$$[\text{Final Capsid}] = \frac{(5[1])^{72}}{60}\,\exp\left(-\frac{180a + 60b + 30c}{RT}\right). \qquad (21)$$

The factor 60 in the denominator corresponds to the order of the symmetry group of the 
final capsid as can be seen by comparison with Eq. (|2I), noting that 



The factors n(P) for the primary intermediates can be obtained as follows:

$$n(P) = (5[1])^{e(P)}\,\exp\left(-\frac{\alpha(P)\,a + \beta(P)\,b + \gamma(P)\,c}{RT}\right), \qquad (23)$$

where e(P) is the number of building blocks added in the transition from primary
assembly intermediate P - 1 to primary assembly intermediate P, and α(P), β(P) and
γ(P) are the numbers of a, b and c bonds, respectively, added during that step. The
corresponding values for SV40 can be read off from Tables 4, 5 and 6 in Section A2 of
the Appendix.
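As an illustration of Eq. (23), the factor for a single step can be evaluated numerically. This is a sketch under stated assumptions: energies in kcal/mol with the sign convention of Section 5.2 (negative for favourable bonds), a Boltzmann factor exp(-ΔG/RT), and T = 298.15 K standing in for "room temperature". The name step_factor is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1
T_ROOM = 298.15       # assumed "room temperature" in K

def step_factor(c1, e, alpha, beta, gamma, a, b, c, T=T_ROOM):
    """Evaluate n(P) = (5[1])**e * exp(-(alpha*a + beta*b + gamma*c) / RT).

    c1      free pentamer concentration [1] in M
    e       number of building blocks added in the step P-1 -> P
    alpha, beta, gamma  numbers of a, b and c bonds formed in that step
    a, b, c association free energies in kcal/mol (negative = favourable)
    """
    rt = R_KCAL * T
    bond_energy = alpha * a + beta * b + gamma * c
    return (5 * c1) ** e * math.exp(-bond_energy / rt)

# Illustrative call with the energies of Section 5.2 and a hypothetical [1] = 1e-4 M.
print(step_factor(1e-4, 1, 4, 2, 1, -0.7, -1.37, -1.49))
```

For the last step in Table 6 (one tile S with bonds 4a + 2b + c), the call above corresponds to e = 1, α = 4, β = 2, γ = 1; in practice [1] would be the measured or assumed free-pentamer concentration.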

We compute the concentrations of the assembly intermediates based on starting values
given by the dissociation constants, as in Subsection 3.3. However, since there are
different types of bonds in this model, we use here the average dissociation constant,
which will also be denoted by K_dapp. In Table 2 we give the concentrations of the
primary assembly intermediates for SV40 for the starting values ½K_dapp, K_dapp and
2K_dapp, where K_dapp is calculated at room temperature with free energies¹
a = -0.7 kcal mol^-1, b = -1.37 kcal mol^-1 and c = -1.49 kcal mol^-1.
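The three starting values differ by factors of two, which gives a quick consistency check on Table 2: by Eq. (21) and the factors in Eq. (23), an intermediate built from N pentamers is proportional to [1]^N, so doubling [1] should multiply its concentration by 2^N. The sketch below assumes that the species label in Table 2 equals the number of pentamers, and uses a few entries copied from the table.

```python
import math

# (N, concentration at the lower [1], concentration at the higher [1], fold change of [1])
# Entries copied from Table 2; N is assumed to be the number of pentamers.
PAIRS = [
    (3, 4.170e-6, 3.336e-5, 2),    # K_dapp -> 2K_dapp
    (4, 5.372e-8, 1.375e-5, 4),    # K_dapp/2 -> 2K_dapp
    (9, 2.600e-12, 1.331e-9, 2),   # K_dapp/2 -> K_dapp
    (68, 1.282e-28, 3.783e-8, 2),  # K_dapp/2 -> K_dapp
    (69, 1.405e-28, 8.295e-8, 2),  # K_dapp/2 -> K_dapp
    (71, 1.634e-26, 3.859e-5, 2),  # K_dapp/2 -> K_dapp
]

def apparent_size(c_lo, c_hi, fold):
    """Solve c_hi / c_lo = fold**N for N, i.e. assume [N] is proportional to [1]**N."""
    return math.log(c_hi / c_lo) / math.log(fold)

for n, lo, hi, fold in PAIRS:
    print(f"species {n}: apparent N = {apparent_size(lo, hi, fold):.2f}")
```

Each apparent N comes out within rounding of the species label, consistent with the power-law dependence on the free-pentamer concentration.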



¹ In contrast to the ratios x and y relevant to the analysis of the phase space in Section 5.1,
the absolute values of the association energies on the VIPER webpages are higher than the
ones usually observed in experiments. This discrepancy may be explained by the fact that
the data on the VIPER webpages have been obtained via computations based on the scenario
of rigid building blocks, which may be too strong an assumption (Zlotnick, private
communication). Therefore we are using the range of values suggested in (Keef et al. 2005)
together with the relative ratios stemming from the VIPER data.

Species   Concentrations (in molar units M)
          ½K_dapp           K_dapp            2K_dapp
1         1.822 × 10^-4     3.643 × 10^-4     7.287 × 10^-4
2         6.751 × 10^-6     2.700 × 10^-5     1.080 × 10^-4
3         5.212 × 10^-7     4.170 × 10^-6     3.336 × 10^-5
4         5.372 × 10^-8     8.594 × 10^-7     1.375 × 10^-5
5         8.261 × 10^-9     2.644 × 10^-7     8.459 × 10^-6
8         1.690 × 10^-11    4.327 × 10^-9     1.108 × 10^-6
9         2.600 × 10^-12    1.331 × 10^-9     6.815 × 10^-7
10        6.685 × 10^-13    6.845 × 10^-10    7.010 × 10^-7
11        6.464 × 10^-14    1.324 × 10^-10    2.711 × 10^-7
12        2.654 × 10^-14    1.087 × 10^-10    4.453 × 10^-7
68        1.282 × 10^-28    3.783 × 10^-8     1.117 × 10^13
69        1.405 × 10^-28    8.295 × 10^-8     4.897 × 10^13
70        4.696 × 10^-28    5.544 × 10^-7     6.545 × 10^14
71        1.634 × 10^-26    3.859 × 10^-5     9.111 × 10^16
72        7.715 × 10^-26    3.643 × 10^-4     1.720 × 10^18

Table 2: Concentrations of species at primary nodes based on ½K_dapp, K_dapp and
2K_dapp, respectively.

As in Zlotnick's case, we observe that pentameric subunits and final capsids are the
dominant species.



6 Conclusion and outlook

We have extended Zlotnick's assembly model to the scenario where the building blocks
(capsomers) assume different local bonding configurations in the capsid. This has led
to the occurrence of assembly trees rather than a linear assembly pathway. We have
characterised the structural dependence of these assembly trees on the ratios of the
association energies via phase space portraits (Subsection 5.1). This approach has allowed
us to determine the structure of the statistically dominant intermediates, the primary
nodes introduced in Subsection 4.2. A formula has been derived (Eq. fTKj)) that provides
a means to calculate concentrations of primary assembly intermediates as a function of
the concentration of their primary predecessor in the assembly pathway. Although this
formula is unnecessarily complicated as long as one is only interested in the transition
factors between primary intermediates (these are independent of the path chosen, as
shown in Theorem 4.2), it encodes statistical factors relevant for the calculation of
concentrations of assembly intermediates within a bundle, as demonstrated for the
example in Fig. El.

While the set-up discussed here lends itself, by construction, to an equilibrium analysis
of the assembly process, it can only provide clues about possible assembly kinetics. This
issue is discussed in detail in (Keef et al. 2005), where the probability distribution of
all intermediates is computed explicitly via a master equation approach and used to
obtain information on the putative pathways of assembly.

The theory presented here opens up various avenues for applications. For example, it
provides a method for determining the values of the association constants experimentally
by measuring the equilibrium concentrations of the basic building blocks (pentamers).
In particular, choosing [Final Capsid] = [1] as in (Zlotnick 1994) yields, via Eq. (21),
the relation

$$[1]_{eq} = \frac{(5[1]_{eq})^{72}}{60}\,\exp\left(-\frac{H_1\,a}{RT}\right), \qquad (24)$$

where H_1 = 180 + 60(b/a) + 30(c/a) is given in terms of the ratios of the association
energies b and a, as well as c and a. As before, assuming that the relative values (ratios)
of the association energies given on the VIPER webpages are a good approximation (while,
as discussed earlier, their absolute values are too large), these can be used to obtain
H_1 ≈ 360.8571. Hence the association energy a is given by

$$a([1]_{eq}) = \frac{RT}{H_1}\,\ln\left(\frac{5^{72}\,[1]_{eq}^{71}}{60}\right), \qquad (25)$$

and can be determined via an experimental measurement of the equilibrium concentration
[1]_eq of the pentamers.
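Eq. (25) can be checked by a round trip: the value of a it returns, substituted into Eq. (21), should give [Final Capsid] = [1]_eq. A minimal Python sketch, working in logarithms because (5[1])^72 and [1]^71 span many orders of magnitude; the value H_1 ≈ 360.8571 is taken from the text, while T = 298.15 K and the example concentration are illustrative assumptions.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1

def log_final_capsid(c1, a, h1, rt):
    """ln[Final Capsid] from Eq. (21), writing 180a + 60b + 30c as H_1 * a."""
    return 72 * math.log(5 * c1) - math.log(60) - h1 * a / rt

def a_from_pentamer(c1_eq, h1, rt):
    """Eq. (25): a = (RT/H_1) * ln(5**72 * [1]_eq**71 / 60), evaluated in log space."""
    return (rt / h1) * (72 * math.log(5) + 71 * math.log(c1_eq) - math.log(60))

rt = R_KCAL * 298.15   # assumed room temperature
h1 = 360.8571          # value quoted in the text
c1_eq = 3.643e-4       # illustrative [1]_eq in M (hypothetical measurement)
a = a_from_pentamer(c1_eq, h1, rt)
print(f"a = {a:.3f} kcal/mol")
# Round trip: with this a, Eq. (21) returns [Final Capsid] = [1]_eq.
print(math.exp(log_final_capsid(c1_eq, a, h1, rt)), c1_eq)
```

Working with ln[1] rather than [1]^71 avoids floating-point underflow for the small concentrations involved.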

Moreover, the phase space analysis introduced here provides a tool to control changes
in the structure of assembly trees, and hence of assembly intermediates and pathways of
assembly, as a function of variations in the association energies. It could also be used
in conjunction with a more general set-up, in which different types of capsids are
obtained from the protein building blocks after engineering changes in their polypeptide
chain (see for example (Johnson 2005)). In such a setting, the phase space analysis would
give a quantitative prediction of the association energies needed to trigger the desired
outcome, and would hence provide guidance when engineering changes in the polypeptide
chains. Our approach could therefore lead to applications in the engineering of
nanoparticles and nanocontainers.
Appendix

A1: The assembly tree in Zlotnick's model

The assembly tree consists of a single pathway, and the corresponding results are given
in Table 3, adapted from (Zlotnick 1994).
Table 3: Assembly intermediates and factors describing their assembly, based on
(Zlotnick 1994).



A2: The assembly tree of primary intermediates in the vertex-star model

As discussed in Section the set of primary assembly intermediates depends on the 
association constants. The sequence of successive primary assembly intermediates, the 
building blocks added between the primary intermediates (called S and P, respectively), 
the number of bonds a, b and c added, as well as the order of the discrete rotational 
symmetries of the assembly intermediates are shown in the three tables below for SV40. 



[Table 4: graphical layout not recoverable. Columns: Species, Model, New Tiles, New
Bonds, sym. Last row: species 8; new tiles P+2S; new bonds 8a + b + c; sym 1.]

Table 4: Assembly intermediates and factors describing their assembly. 
Species   Model     New Tiles     New Bonds       sym
70        [image]   [illegible]   3a + b + c      2
71        [image]   S             4a + 2b         1
72        [image]   S             4a + 2b + c     60

Table 6: Assembly intermediates and factors describing their assembly (continued).
A3: Primary intermediates in phase space



In this section we explore the areas of the phase space in Fig. 8 around the shaded area
corresponding to SV40. The numbers in the first row of the table correspond to the
numbering of areas in that figure. For each area, a representative point is chosen, and
its values for a/c and b/c (with c = 1) are given in the rows below. The column below
each area shows the occurrence of primary intermediates and their relation to primary
intermediates in the other (numbered) areas of phase space. The table should be read
as follows. In each row, for a given value of N, we enter a symbol if the corresponding
intermediate is a primary intermediate, and leave the entry blank otherwise. For given
N, we fill rows from left to right, starting with the heart symbol for the first intermediate
that occurs. If the same intermediate occurs for this value of N in another region, we
enter the same symbol; otherwise, we choose another symbol from the following list
(given in order of occurrence): heart, club, diamond, spade, star. In this way, one can
read off which primary intermediates are shared by the areas adjacent to the red area in
Fig. 8 representing SV40.
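The symbol-assignment rule amounts to labelling intermediates in order of first appearance. A small sketch of that rule for one row of the table, where each entry is a hypothetical label for the primary intermediate found in an area (or None when there is none):

```python
SYMBOLS = ["heart", "club", "diamond", "spade", "star"]

def assign_symbols(row):
    """Symbols for one row of the table (fixed N), filled left to right.

    row: one entry per area, either a label identifying the primary
    intermediate with N building blocks in that area, or None if there is none.
    The same intermediate always gets the same symbol; a new one gets the next
    unused symbol. More than five distinct intermediates raises IndexError.
    """
    seen = {}
    cells = []
    for item in row:
        if item is None:
            cells.append("")  # not a primary intermediate: blank entry
            continue
        if item not in seen:
            seen[item] = SYMBOLS[len(seen)]
        cells.append(seen[item])
    return cells

# Hypothetical row: areas 1 and 3 share an intermediate, area 2 has none.
print(assign_symbols(["I1", None, "I1", "I2"]))  # ['heart', '', 'heart', 'club']
```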
Figure 9: Template for assembly of vertex-star models.
Table 7: Primary intermediates for SV40 and neighbouring regions of the phase space.



Glossary 



Assembly tree: representation of all possible pathways in the build-up of a viral
capsid through the association of a single capsomer at a time; usually allows for
ramifications or branches, which ultimately converge when the final capsid is formed. A
branchless tree is called a linear assembly tree.

Capsid: protein shell that encloses the nucleic acid of a virus particle. Commonly 
exhibits icosahedral symmetry and may be itself enclosed in an envelope (although not 
in the case of SV40). The capsid is built up of protein subunits (integer multiples of 60) 
that self-assemble in a pattern typical of a particular virus. 

Capsomer: regular polygonal grouping of proteins seen on the surface of viruses, con- 
sisting of usually 3 (trimer), 5 (pentamer), or 6 (hexamer) of the capsid proteins. 

C-terminal arm: One end of the polypeptide chain of a protein is called the C-terminus 
(the other end being the N-terminus). C-terminal arms are bonds obtained by binding 
of the C-terminal arm extension of one protein to the polypeptide chain of another. 

Icosahedral symmetry: a body with icosahedral symmetry possesses a number of axes 
about which it may be rotated to give identical appearances. These are six 5-fold, ten 
3-fold and fifteen 2-fold axes of symmetry. 
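The order 60 of the icosahedral rotation group, which gives the factor 60 in Eq. (21), follows from these axes: an n-fold axis contributes n - 1 non-trivial rotations, and the identity is counted once.

```python
# fold of each axis -> number of such axes in an icosahedral body
AXES = {5: 6, 3: 10, 2: 15}

# An n-fold axis gives rotations by 2*pi*k/n for k = 1..n-1; add the identity once.
order = 1 + sum((fold - 1) * count for fold, count in AXES.items())
print(order)  # 60
```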

Papovaviridae: family of DNA viruses including papilloma, polyoma and simian vacuo- 
lating virus (SV40). These viruses are small, non-enveloped and mainly infect mammals. 
They may be linked to some forms of cancer. 

(pseudo-)T = 7 capsid: capsid with a surface structure in which the locations of the 
capsomers follow geometries with T-number 7, but the capsomers do not exhibit the 
required symmetry properties, having fewer protein building blocks than predicted by 
the corresponding geometry with T-number 7. 

Quasi-equivalence: extent of similarity between structurally unique environments oc- 
cupied by chemically identical protein subunits in viral capsids. 

Tiling: partition of a space into a countable number of building blocks called tiles that 
cover the space without gaps or overlaps. It requires matching rules indicating how tiles 
are fitted together. These matching rules may be realised through decoration of tiles. 
Unlike Penrose tiles, which tile the plane aperiodically, the tiles used here provide a 
tessellation of a closed surface with icosahedral symmetry. 

T-number: triangulation number, which corresponds to the number of unique quasi-
equivalent environments present in a given icosahedral surface lattice. It is given by
T = h² + hk + k², where h and k are nonnegative integers (not both zero), so that T is
in the sequence 1, 3, 4, 7, ... SV40 corresponds to h = 2, k = 1, hence T = 7 dextro (d)
[the case h = 1, k = 2 yields T = 7 laevo (l)]. The triangulation number classifies the
geometries of triangulations with icosahedral symmetry, and the terminology
triangulation number refers to the fact that it can be interpreted as the number of
triangles in the triangulation of one of the twenty triangular faces of the icosahedron.
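The formula for T can be enumerated directly. A short sketch that lists the admissible values over nonnegative pairs (h, k), not both zero, and shows that h = 2, k = 1 and h = 1, k = 2 both give T = 7 (the two enantiomorphic lattices):

```python
def t_number(h: int, k: int) -> int:
    """Triangulation number T = h^2 + hk + k^2."""
    return h * h + h * k + k * k

def t_numbers(limit: int) -> list:
    """Sorted distinct T <= limit over nonnegative (h, k), not both zero."""
    values = {
        t_number(h, k)
        for h in range(limit + 1)
        for k in range(limit + 1)
        if (h, k) != (0, 0) and t_number(h, k) <= limit
    }
    return sorted(values)

print(t_numbers(13))                   # [1, 3, 4, 7, 9, 12, 13]
print(t_number(2, 1), t_number(1, 2))  # 7 7
```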
Vertex star: in tiling theory, a set of tiles sharing the same vertex. Distinct vertex
stars form the vertex atlas of the tiling. 

Viral Tiling Theory: A theory that encodes the surface structures of viral capsids 
in tessellations such that each tile in the tessellation represents a dimer- or a trimer- 
interaction between proteins located in the corners of the tiles. 